54 research outputs found

    Clustering of variables with a three-way approach for health sciences

    © 2014 Cises. This work is distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0). Cluster analysis, or classification, usually concerns a set of exploratory multivariate data analysis methods and techniques for grouping either a set of statistical data units or the associated set of descriptive variables into clusters of similar and, ideally, well-separated elements. In this work we refer to an extension of this paradigm to generalized three-way data representations, and particularly to the classification of interval variables. Such an approach appears to be especially useful for large databases, mostly in a data mining context. A health sciences case study is partially discussed. This research was partially supported by ISAMB (Faculty of Medicine, University of Lisbon, Lisbon) and CEEAplA (University of the Azores, Ponta Delgada, Azores).

    A global Approach to the Comparison of Clustering Results

    Copyright © 2012 Walter de Gruyter GmbH. The discovery of knowledge in the case of Hierarchical Cluster Analysis (HCA) depends on many factors, such as the clustering algorithms applied and the strategies developed in the initial stage of Cluster Analysis. We present a global approach for evaluating the quality of clustering results and making a comparison among different clustering algorithms using the relevant information available (e.g. the stability, isolation and homogeneity of the clusters). In addition, we present a visual method to facilitate evaluation of the quality of the partitions, allowing identification of the similarities and differences between partitions, as well as the behaviour of the elements in the partitions. We illustrate our approach using a complex and heterogeneous dataset (real horse data) taken from the literature. We apply HCA based on the generalized affinity coefficient (similarity coefficient) to the case of complex data (symbolic data), combined with 26 (classic and probabilistic) clustering algorithms. Finally, we discuss the obtained results and the contribution of this approach to gaining better knowledge of the structure of data.

    Quality evaluation of a selected partition : An approach based on resampling methods

    The aim of this work on cluster analysis is to provide a methodology to analyse and assess the quality of a selected partition (the best partition according to several validation indexes). In the proposed approach, the stability and consistency of the selected partition (the original partition) are evaluated by comparing it with each of the partitions obtained by resampling (with the same number of clusters as the original one). Special emphasis is given to an index defined as a linear combination of four indicators, which allows evaluating the agreement between the original partition and each of the partitions (and/or the set of partitions) obtained from the resampled data. The application of these indexes is exemplified using a set of real data, and the main conclusions are summarized and discussed. CICS.UAc/CICS.NOVA.UAc, UID/SOC/04647/2013; this paper was produced with support from the FCT/MEC through National Funds and, when applicable, co-financed by the FEDER within the partnership agreement PT2020.

    Distribution of the Affinity Coefficient between Variables based on the Monte Carlo Simulation Method

    This journal provides immediate open access to its content on the principle that making research freely available to the public supports a greater global exchange of knowledge. The affinity coefficient and its extensions have been used in both hierarchical and non-hierarchical Cluster Analysis. The purpose of the present empirical study on the distribution of the basic and the generalized affinity coefficients, and on the distribution of the standardized affinity coefficient by the method of Wald and Wolfowitz, under different assumptions, is to assess the effect of the statistical probability distributions of the variables (columns) of the initial data matrix, and of the respective parameters, on the distribution of the values of these coefficients. We present some results concerning the asymptotic distribution of these coefficients under the assumption that the variables (for which the values of these coefficients are calculated) are independent and have statistical probability distributions specified a priori. In this distributional study, based on the Monte Carlo simulation method, we considered ten well-known statistical probability distributions with different variations of the respective parameters. The simulation studies lead to the conclusion that the coefficients' convergence to the normal distribution is quite fast and that, in general, a good approximation is obtained for small sample sizes, that is, for sample sizes above 20 and in many cases for sample sizes above 10.
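    As a rough illustration of the kind of Monte Carlo experiment this abstract describes, the sketch below simulates the basic affinity coefficient between two independent columns. It assumes the usual definition of the basic affinity coefficient as the sum, over rows, of the square roots of the products of the two column profiles. The function names, the exponential distribution, the number of replications and the sample sizes are illustrative assumptions, not the study's actual design, and the Wald and Wolfowitz standardization is not reproduced here.

```python
import math
import random


def affinity(x, y):
    """Basic affinity coefficient between two non-negative columns.

    Assumed definition: sum_i sqrt((x_i / sum(x)) * (y_i / sum(y))).
    It lies in [0, 1] and equals 1 when the columns are proportional.
    """
    sx, sy = sum(x), sum(y)
    return sum(math.sqrt((a / sx) * (b / sy)) for a, b in zip(x, y))


def simulate_affinity(n, draw, reps=2000, seed=0):
    """Draw `reps` pairs of independent columns of length n and
    return the affinity coefficient of each pair."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        x = [draw(rng) for _ in range(n)]
        y = [draw(rng) for _ in range(n)]
        values.append(affinity(x, y))
    return values


def summary(values):
    """Sample mean and sample standard deviation of the simulated values."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return m, math.sqrt(var)


if __name__ == "__main__":
    # Illustrative choice: exponential(1) entries, which are positive.
    for n in (10, 20, 50):
        vals = simulate_affinity(n, lambda rng: rng.expovariate(1.0))
        mean, sd = summary(vals)
        print(f"n={n:3d}  mean={mean:.4f}  sd={sd:.4f}")
```

    Comparing a histogram of the simulated values with a normal curve for increasing n is one simple way to inspect how quickly the distribution approaches normality under a given assumption.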

    Cluster Analysis of Business Data

    In this work, classical as well as probabilistic hierarchical clustering models are used to look for typologies of variables in classical data, typologies of groups of individuals in a classical three-way data table, and typologies of groups of individuals in a symbolic data table. The data come from a questionnaire in the business area designed to evaluate the quality of, and satisfaction with, the services provided to customers by an automobile company. The Ascendant Hierarchical Cluster Analysis (AHCA) is based, respectively, on the basic affinity coefficient and on extensions of this coefficient for the cases of a classical three-way data table and a symbolic data table, obtained from the weighted generalized affinity coefficient. The probabilistic aggregation criteria used, under the probabilistic approach named VL methodology (V for Validity, L for Linkage), resort essentially to probabilistic notions for the definition of the comparative functions. The validation of the obtained partitions is based on the global statistics of levels (STAT).

    Clustering of Symbolic Data based on Affinity Coefficient: Application to a Real Data Set

    Copyright © 2013 Walter de Gruyter GmbH. In this paper, we illustrate an application of Ascendant Hierarchical Cluster Analysis (AHCA) to complex data taken from the literature (interval data), based on the standardized weighted generalized affinity coefficient, by the method of Wald and Wolfowitz. The probabilistic aggregation criteria used belong to a parametric family of methods under the probabilistic approach of AHCA, named VL methodology. Finally, we compare the results achieved using our approach with those obtained by other authors.

    On clustering interval data with different scales of measures : experimental results

    This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC). Symbolic Data Analysis can be defined as the extension of standard data analysis to more complex data tables. We illustrate the application of the Ascendant Hierarchical Cluster Analysis (AHCA) to a symbolic data set (with a known structure) in the field of the automobile industry (car data set), in which objects are described by variables whose values are intervals of the real data set (interval variables). The AHCA of thirty-three car models, described by eight interval variables (with different scales of measure), was based on the standardized weighted generalized affinity coefficient, by the method of Wald and Wolfowitz. We applied three probabilistic aggregation criteria in the scope of the VL methodology (V for Validity, L for Linkage). Moreover, we compare the achieved results with those obtained by other authors, and with an a priori partition into four clusters defined by the category (Utilitarian, Berlina, Sporting and Luxury) to which each car belongs. We used the global statistics of levels (STAT) to evaluate the obtained partitions.

    Clustering an interval data set : are the main partitions similar to a priori partition?

    This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. In this paper we compare the best partitions of data units (cities), obtained from different algorithms of Ascendant Hierarchical Cluster Analysis (AHCA) applied to a well-known data set from the literature on symbolic data analysis (“city temperature interval data set”), with an a priori partition of the cities given by a panel of human observers. The AHCA was based on the weighted generalised affinity coefficient with equal weights, and on the probabilistic coefficient associated with the asymptotic standardized weighted generalized affinity coefficient by the method of Wald and Wolfowitz. These similarity coefficients between elements were combined with three aggregation criteria: one classical, Single Linkage (SL), and two probabilistic, AV1 and AVB, the latter two in the scope of the VL methodology. The evaluation of the partitions, in order to find the partitioning that best fits the underlying data, was carried out using some validation measures based on the similarity matrices. In general, satisfactory results were obtained using our methods, with the best partitions being quite close to (or even coinciding with) the a priori partition provided by the panel of human observers.

    Entrepreneurship Promotion in Higher Education Institutions

    The importance of entrepreneurship promotion has increased significantly in today's society, especially during periods of crisis. This work is based on the responses obtained through a survey conducted on a sample of 305 undergraduates of the University of the Azores, enrolled in different science programs. The aim is to deepen the knowledge of the entrepreneurial propensity of higher education students in the Azores, so that the university can stimulate their interest in creating businesses. The main results obtained, using exploratory data analysis (from the univariate to the multivariate), are presented and discussed. Research paper. Reference to this paper should be made as follows: Sousa, Á., Couto, G., Branco, N., Silva, O., Bacelar-Nicolau, H. (2017). “Entrepreneurship Promotion in Higher Education Institutions”, Journal of Entrepreneurship, Business and Economics, Vol. 5, No. 1, pp. 157–184.

    Symbolic Data Analysis for the Assessment of User Satisfaction: An Application to Reading Rooms Services.

    Special edition of the European Scientific Journal (ESJ): Conference proceedings: 1st Annual International Interdisciplinary Conference AIIC 2013, 24-26 April, Azores Islands, Portugal. This paper re-examines and deepens the study of a portion of the data collected within the context of a wider 2007 research project conducted in the Autonomous Region of Azores. The 2007 study aimed to understand users’ habits, attitudes and cultural practices, concerning reading and utilization of different library services, archives and museums. Based upon knowledge that only data analysis of a representative sample can supply, the study aimed to identify the aspects that should be prioritized in a process of restructuring the cultural services of leisure and reading to be implemented. This paper, utilizing data from the 2007 study, presents some results from the Ascendant Hierarchical Cluster Analysis (AHCA) of symbolic objects, according to the treatment to which they were submitted. These objects are described by different symbolic attributes pertaining to the latent variable ‘Degree of Satisfaction’. This variable was evaluated according to different dimensions of on-the-spot reading and consultation services. The aggregation criteria used in this study belong to a parametric family of methods and the similarity measure used is the weighted generalized affinity coefficient, for symbolic data. The validation of the clustering results is based on some validation measures.